Fine keyword clustering using a thesaurus and example sentences for speech translation
Abstract
For robust speech translation, we propose a new language translation method in which speech recognition results are mapped to example sentences using keywords. In this method, keyword clustering is used to cope with recognition errors and with the wide variety of words that do not appear in the training corpus. Initial classes, defined using only a thesaurus, are redefined using the dependencies between keywords in a limited number of example sentences. The effectiveness of our keyword clustering method is confirmed through example sentence search experiments. These experiments used keyword sets taken from (a) different sentences that include keywords not in the example sentences and (b) recognition results for those sentences that contained recognition errors. Compared with a search method that uses keyword classes defined using only a thesaurus, our proposed method achieved lower search error rates.
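As a rough illustration of the class-based lookup the abstract describes, the sketch below maps keywords to thesaurus classes and picks the example sentence whose classes overlap most with the input. It is only a minimal sketch under stated assumptions: the thesaurus entries, the example sentences, the names `THESAURUS`, `EXAMPLES`, `to_classes` and `search_example`, and the Jaccard overlap score are all hypothetical, and the paper's refinement of classes using keyword dependencies is not modelled here.

```python
from typing import Dict, List, Set

# Hypothetical thesaurus: maps each word to a coarse semantic class.
THESAURUS: Dict[str, str] = {
    "hotel": "LODGING",
    "inn": "LODGING",
    "reserve": "BOOK",
    "book": "BOOK",
    "station": "PLACE",
    "airport": "PLACE",
    "go": "MOVE",
}

# Hypothetical example sentences, each indexed by its keywords.
EXAMPLES: Dict[str, List[str]] = {
    "I would like to reserve a hotel.": ["reserve", "hotel"],
    "How do I go to the station?": ["go", "station"],
}


def to_classes(keywords: List[str]) -> Set[str]:
    """Map keywords to classes; words missing from the thesaurus are dropped."""
    return {THESAURUS[w] for w in keywords if w in THESAURUS}


def search_example(keywords: List[str]) -> str:
    """Return the example sentence whose keyword classes best match the input."""
    query = to_classes(keywords)

    def score(sentence: str) -> float:
        # Jaccard overlap between class sets (an assumed scoring choice).
        classes = to_classes(EXAMPLES[sentence])
        union = query | classes
        return len(query & classes) / len(union) if union else 0.0

    return max(EXAMPLES, key=score)
```

Because matching happens at the class level, an input such as `["book", "inn"]` can still reach the hotel reservation example even though neither word appears in it, and an unknown token (for instance a misrecognized word) is simply ignored.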
Similar resources
An Experimental Multilingual Bi-directional Speech Translation System
We describe an experimental multilingual bi-directional speech translation system utilizing small, PC-based hardware with a multi-modal user interface. Two major problems for people using an automatic speech translation device are speech recognition errors and language translation errors. We focus on developing techniques to overcome these problems. The techniques include a new language translati...
Recent Advances in Example-Based Machine Translation
This book, an outcome of a 2001 workshop on Example-Based Machine Translation (EBMT) in Santiago de Compostela, very appropriately starts with a preface by Professor Makoto Nagao in which he explains how the limits of rule-based Machine Translation (MT) led him to propose his translation by analogy principle in 1981 (published as Nagao, 1984). His idea, inspired by second language learning meth...
CLEF-2005 CL-SR at Maryland: Document and Query Expansion using Side Collections and Thesauri
This paper reports results for the University of Maryland’s participation in the CLEF-2005 Cross-Language Speech Retrieval track. Techniques that were tried include: (1) document expansion with manually created metadata (thesaurus keywords and segment summaries) from a large side collection, (2) query refinement with pseudo-relevance feedback, (3) keyword expansion with thesaurus synonyms, and (4) ...
Studying impressive parameters on the performance of Persian probabilistic context free grammar parser
In linguistics, a treebank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of treebank data has been important ever since the first large-scale treebank, the Penn Treebank, was published. However, although originating in computational linguistics, the value of treebanks is becoming more widely appreciated in linguistics research as a whole. F...
A Part-of-Speech-Based Search Algorithm for Translation Memories
The retrieval of related sentences in state-of-the-art translation memory systems is based on orthographic similarities. This often leads to poor search results, since orthographically similar sentences are not necessarily semantically related. In this paper we propose a search algorithm that aims to reduce this problem by taking part-of-speech information into account. It requires that the par...